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Abstract 

In  this  paper,  we  develop  a  method  for  solving  a  large  class  of  non-convex  Hamilton-Jacobi  partial 
differential  equations  (HJ  PDE).  The  method  yields  decoupled  subproblems,  which  can  be  solved  in 
an  embarrassingly  parallel  fashion.  The  complexity  of  the  resulting  algorithm  is  polynomial  in  the 
problem  dimension;  hence,  it  overcomes  the  curse  of  dimensionality  [1,  2].  We  extend  previous  work  in 
[6]  and  apply  the  Hopf  formula  to  solve  HJ  PDE  involving  non-convex  Hamiltonians.  We  propose  an 
ADMM  approach  for  finding  the  minimizer  associated  with  the  Hopf  formula.  Some  explicit  formulae 
of  proximal  maps,  as  well  as  newly-defined  stretch  operators,  are  used  in  the  numerical  solutions  of 
ADMM  subproblems.  Our  approach  is  expected  to  have  wide  applications  in  continuous  dynamic 
games,  control  theory  problems,  and  elsewhere. 
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1  Introduction 

It  is  well  known  that  Hamilton- Jacobi-Isaacs  partial  differential  equations  (HJ  PDE)  play  a  very  im¬ 
portant  role  in  analyzing  continuous/differential  dynamic  games,  control  theory  problems,  and  dynamical 
systems  coming  from  the  physical  world,  e.g.  [11].  An  important  application  is  to  compute  the  evolution 
of  geometric  objects  [25],  which  was  first  used  for  reachability  problems  in  [21,  22]  to  our  knowledge. 

Numerical  solutions  to  HJ  PDE  have  attracted  a  lot  of  attention.  Most  of  the  methods  involve  the 
introduction  of  a  grid  and  a  finite  difference  discretization  of  the  Hamiltonian.  Some  of  these  well-known 
methods  using  discretization  include  ENO/WENO-type  methods  [24]  and  Dijkstra-type  [9]  methods 
such  as  fast  marching  [31]  and  fast  sweeping  [30].  However,  owing  to  the  discretization  nature,  these 
numerical  approaches  of  HJ  PDE  suffer  from  poor  scaling  with  respect  to  dimension,  hence  rendering 
them  impossible  to  be  applied  to  problems  in  high  dimensions. 

Research  has  therefore  been  conducted  by  several  groups  in  search  of  possible  algorithms  that  can 
scale  reasonably  with  dimension.  Some  new  algorithms  are  introduced  in  e.g.  [7,  18,  19].  In  [6],  the 
authors  proposed  a  causality-free  method  for  solving  convex  HJ  PDE  based  on  the  Hopf-Lax  formula. 
An  HJ  PDE  is  called  convex  if  the  Hamiltonian  (in  (2.1)  below)  is  convex  with  respect  to  the  solution 
gradient.  Using  the  Hopf-Lax  formula,  the  PDE  becomes  decoupled  and  the  solution  at  each  point  can 
be  effectively  calculated  by  very  easy,  d-dimensional  minimization. 

In  this  paper,  we  propose  to  extend  the  method  in  [6]  and  apply  the  classical  Hopf  formula  to  solve 
non-convex  HJ  PDE,  where  the  Hamiltonian  is  no  longer  convex.  In  our  proposed  method,  the  solution 
to  the  HJ  PDE  is  evaluated  at  each  point  by  numerical  minimization  (in  the  same  dimension  as  the 
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dimension  at  the  HJ  PDE)  described  by  the  Hopf  formula  at  that  point.  This  way,  evaluating  the 
solution  at  different  points  are  decoupled  into  independent,  easy  subproblems. 

Specifically,  we  propose  to  apply  a  nonconvex  ADMM  algorithm  (c.f.  [16,  34])  to  the  minimization 
at  each  point,  and  a  proximal  map  is  encountered  as  an  ADMM  subproblem.  Some  well-known  methods 
for  evaluating  proximal  maps  of  convex  functionals,  as  well  as  some  non-convex  ones,  are  discussed. 
We  introduce  modified  proximal  maps  that  we  call  the  stretch  operator,  which  help  eliminate  possible 
instability  when  the  original  proximal  maps  are  multi-valued.  This  modification  improves  numerical 
stability  of  our  ADMM  iterations.  We  have  means  of  fast  evaluation  of  several  proximal  maps  inside  each 
ADMM  iteration. 

Each  minimization  has  a  low  complexity,  growing  polynomially  in  the  dimension  of  the  HJ  PDE.  More¬ 
over,  the  solution  at  each  point  does  not  have  any  numerical  error  due  to  finite  difference  approximation. 
The  error  in  the  solution  can  be  arbitrarily  small  depending  on  when  we  stop  the  ADMM  iterations. 
Hence,  the  Hopf  formula  can  be  used  for  rapid  numerical  solutions  for  a  non-convex  Hamiltonian  in  high 
dimensions,  and  this  method  is  expected  to  have  wide  applications  in  optimal  control  and  differential 
games,  where  a  non-convex  Hamiltonian  arises  naturally. 

The  rest  of  our  paper  is  organized  as  follows:  In  Section  2,  we  briefly  review  the  HJ  PDE  with  initial 
data,  as  well  as  the  Hopf-Lax  formula.  Then  in  Section  3,  we  provide  a  brief  review  of  differential  games, 
following  [11]  and  references  therein,  and  explain  its  connection  with  non-convex  HJ  PDE.  In  Section  4,  we 
present  our  optimization-based  algorithm.  Numerical  experiments  in  Section  5  illustrate  the  effectiveness 
of  our  algorithm  in  solving  the  HJ  PDE. 

2  Hamilton-Jacobi  Equations  and  Hopf-Lax  Formulae 

In  this  work,  we  are  concerned  with  the  numerical  approximation  scheme  for  solving  the  following  HJ 
PDE: 


^<p(x,t)  +  H(Vxip(x,  t))  =  0  in  x  (0,  oo) ,  (2.1) 

where  H  :  — >  R.  is  a  continuous  Hamiltonian  function  bounded  from  below  by  an  affine  function,  -^tp 
and  respectively  denote  the  partial  derivatives  with  respect  to  t  and  the  gradient  vector  with  respect 
to  x  of  the  function  ip  :  Rd  x  (0,  oo)  —>  R.  We  are  also  given  the  initial  data 

tp(x,  0)  =  J(x)  in  Rd .  (2.2) 

For  the  sake  of  simplicity,  we  only  consider  functions  H  and  J  that  are  finite  everywhere.  Results 
presented  in  this  paper  can  be  generalized  to  functions  with  the  extended  value  +00  under  suitable 
assumptions.  We  wish  to  compute  the  viscosity  solution  to  (2.1)-(2.2)  [3,  4]  at  a  given  point  x  €  Rd  and 
time  t  £  (0, 00). 

The  viscosity  solution  to  (2.1)-(2.2)  is  explicitly  given  by  the  Hopf-Lax  formulae,  which  hold  because 
the  integral  curves  of  the  Hamiltonian  vector  field  (i.e.,  the  bi-characteristics  in  the  phase  space)  give 
straight  line  characteristics  when  it  is  projected  to  the  axspace. 

In  the  case  where  J  is  convex  and  1-coercive,  the  solution  ip  to  the  system  (2.1)-(2.2)  is  given  by  the 
following  classical  Hopf  formula  [17,  10]: 

<p(x,  t)  :=  —  mini  J*(v)  +  tH(v)  —  ( x ,  u)}  ,  (2-3) 

veRd 

where  J*  :  Rd  — >  RU{+oo}  is  the  Fenchel-Legendre  transform  of  a  convex,  proper,  lower  semi-continuous 
function  {+00}  (cf.  [27]): 

J*(v)  :=  sup  {(v,x)  —  J(x)} .  (2.4) 
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On  the  other  hand,  in  the  case  where  H  is  convex  and  1-coercive,  the  solution  p  to  the  system  (2.1)-(2.2) 
is  given  by  the  following  Lax  formula  [10]: 


p{x,  t)  =  min  <  J(y)  +  tH 

ye Rd  [ 


*  (  x-y 
t 


(2.5) 


These  two  formulae  extend  to  time  dependent  Hamiltonians  as  follows.  Consider  the  following  HJ  PDE: 
d 


dt 


p(x,t)  +  H(t,  'Vx<p(x,t))  =  0  in  x  (0,  oo) ,  p(x,  0)  =  J(x)  in 


Its  viscosity  solution  p  is  given  explicitly  by  the  generalized  Hopf  formula  [10,  17]  as 

p(x,  t)  =  —  min  /  J*(v)  +  [  H(s,v)ds  —  (x,v)\  . 

veR'i  l  Jo  J 


(2.6) 


(2.7) 


Therefore,  all  the  methods  and  algorithms  that  we  discuss  in  this  work  extend  to  the  time-dependent 
case  provided  that  the  integral  in  the  above  minimization  problem  can  be  efficiently  evaluated. 


3  Review  of  Differential  Games  and  its  Connection  with  Non- 
convex  Hamilton- Jacobi  Equations 

In  this  section,  we  provide  a  brief  review  of  differential  games.  We  follow  the  results,  notation, 
and  exposition  in  [11]  and  the  references  therein,  in  which  differential  games  are  connected  to  viscosity 
solutions  of  possibly  non-convex  HJ  PDE  by  explicit  representation  formulae. 

We  begin  with  a  system  of  differential  equations  given  as  follows.  Fix  0  <  t  <  T,  x  £  Rd.  We  consider 

=  f{s,x(s),a(s),b(s))  t  <  s  <  T , 

\z(*)  =  x, 

where  the  functions 


a  :  [t,T]  — ►  A 

b  :  [t,  T]  — »  B 

are  given  measurable  functions  that  we  call  the  controls  employed  by  players  I  and  II,  and  A  cM.k,B  CM1 
are  given  compact  sets.  In  what  follows,  we  assume  that  the  function 

/  :  [0,  T\  x  x  1  x  B  -) 

is  uniformly  continuous  and 

| \f(t,x,a,b)\  <  Ci 

[\ f(t,x,a,b)  -  f(t,y,a,b)\  <  Ci\x  -  y\ , 

for  some  constant  Ci  and  for  all  0  <  t  <  T,  x,  y  €  Rm,  a  €  A,  b  G  B.  The  unique  solution  to  (3.1)  is 
referred  to  as  the  response  of  the  controls  a(-),  &(•).  Next,  we  introduce  the  payoff  functional  for  a  given 
pair  of  (x,  t ): 

P(a,  b)  :=  Pt,x(a(-),  &(■))  :=  J ^  h(s,  x(s),  a(s),b{s))  ds  +  g(x(T)) , 
where  g  :  — X  K.  satisfies 

f  lfl(*)l  <  c2 

1 1 g(x)  -  g{y)\  <  C2\x-y\ , 
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and  h  satisfies 


I  \h(t,  x,  a,  6)|  <  C3 

\| h(t,x,a,b)  -  h(t,y,a,b)\  <  C3\x  -  y\ , 

for  some  constants  62,63  and  all  0  <  t  <  T,  x,  y  £  Rm,  a  £  A,  b  £  B.  Now  in  a  differential  game,  the 
goal  of  player  I  is  to  maximize  the  functional  P  whereas  that  of  player  II  is  to  minimize  P. 

Next,  we  define  the  lower  and  upper  values  of  the  differential  game,  based  on  the  notation  introduced 
above.  We  first  define  the  two  sets  containing  the  respective  controls  of  players  I  and  II: 

M(t)  ~  {a  :  [t,  T]  — >  A  :  a  is  measurable.}  , 

N(t)  :=  {b  :  [t,T]  — »  B  :  a  is  measurable.}  . 

Define  a  strategy  for  player  I  as  the  map 

a  :  N(t)  — >  M{t) 

for  each  t  <  s  <T  and  b,b  £  B  such  that 

6(r)  =  6(r)  for  a.e.  t  <  r  <  s  =>  a[6](r)  =  a[6](r)  for  a.e.  t  <  r  <  s . 

Likewise,  define  a  strategy  for  player  II  as 

/3  :  M(t)  ->  N(t) 

for  each  t  <  s  <T  and  a,a  G  A  such  that 

a(r)  =  a(r)  for  a.e.  t  <  r  <  s  =>  /3 [a](r)  =  /3[a](r)  for  a.e.  t  <  r  <  s  . 

Now  let  r(t)  denote  the  set  of  all  strategies  for  I  and  A (t)  for  II  beginning  at  time  t.  We  are  well  equipped 
to  define  the  upper  and  lower  values  of  the  differential  game.  The  lower  value  V (x,  t )  is  defined  as 

V(x,t)  :=  inf  sup  Pt:X(a, /3[a\) 

0eA(t)  aeM(t) 

:=  inf  sup  <  /  h(s,x(s),a(s),j3[a](s))ds  +  g(x(T))>  , 

PeA(t)y£M(t)  {  Jt  j 


where  x(-)  solves  (3.1)  for  a  given  pair  of  (x,t).  Likewise,  the  upper  value  U(x,t)  is  defined  as 


U(x,t)  :=  sup  sup  Pt,x(a[b\,b) 
«er(t)  beJV(t) 


:=  sup  sup 
aer(t)  beiv(t) 


h(s,  x{s),a[b](s),b(s))  ds  +  g{x{T))  \  , 


where  x(-)  again  solves  (3.1)  for  a  given  pair  of  (x,  t). 

In  fact,  derived  from  the  dynamic  programming  optimality  conditions  in  [11],  the  lower  and  upper 
values  V  and  U  are  the  viscosity  solutions  of  a  certain  possibly  non-convex  HJ  PDE.  For  the  sake  of  this 
exposition,  we  first  define  the  following  two  Hamiltonians: 

H+(t,  x,p)  =  min  maxi  (fit,  x,  a ,  b),p)  +  hit,  x,  a,  b)}  , 

beB  agA 

H~(t ,  x,p)  =  maxnrin{(/(f,  x,  a,  b),p)  +  hit ,  x,  a ,  b )}  . 
agA  beB 
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A  very  important  case  of  this  class  of  Hamiltonian  is  when  H±(t,x,p)  are  homogeneous  of  degree  1, 
which  is  the  focus  of  this  work.  In  fact,  in  the  case  where  f(t,  x,a,b)  =  a  —  b  and  h(t,  x,  a,  b)  =  0,  it  holds 
that  H+  and  H~  coincide,  as  well  as  the  following  relationship: 

H±(t,x,p)  =  min max{ (a, p)  -  (b,p)}  =  max{(o,p}}  -  max{(6,p)}  =l\{p)  -X*B(p) . 

beB  aeA  aeA  beB 

where  I4  and  IB  are  the  indicator  functions  of  the  sets  A  and  B,  respective.  In  this  case,  H±(t,x,p) 
can  be  written  as  a  difference  of  two  positively  homogeneous  (of  degree  1)  Hamiltonians  $1,  $2,  namely, 

=  $i(p)  -  $2(p) 

where  $1  and  $2  have  their  respective  Wulff  sets  as  A  and  B  (see  [12,  15,  26]  for  more  details  of  the 
Wulff  set.) 

Now,  for  a  general  pair  of  H±(t,x,p),  we  have  the  following  well-known  theorem. 

Theorem  3.1.  [11]  The  function  U  is  the  viscosity  solution  to  the  HJ  PDE  : 

(  §-tU  +  H+(t,x,VxU)  =  0  on  x  [t,  T] , 
yJ(x,T)  =  g(x)  on  Rd  . 

Similarly,  the  function  V  is  the  viscosity  solution  to  the  HJ  PDE  : 

(§-tV  +  H~(t,x,VxV)  =  0  onKdx[f,T], 

|  V(x,T)=g(x)  onM.d . 

It  is  worth  mentioning  again  that,  in  a  general  setting  where  h  is  possibly  non-convex,  the  two  Hamilto¬ 
nians  H+  (t,  x,p)  and  H~  (t,x,p)  may  not  coincide.  But,  when  they  do,  there  is  the  following  corollary: 

Corollary  3.2.  [11]  If 

H+(t,  x,p)  =  H~(t,  x,p)  on  [t,  T]  x  x  Rd  , 


then  it  holds  that  U  =  V . 


Hereafter,  when  U  =  V,  we  write  <p(x,t)  :=  U(x,T  —  t)  =  V(x,T  —  t).  Note  that  in  general,  the 
Hamiltonians  H+  and  H~  can  be  non-convex  and/or  non-concave,  and  this  is  one  very  important  occasion 
that  non-convex  HJ  PDE  comes  in. 

We  now  solve  the  above  two  HJ  PDE  arising  from  differential  games  with  the  non-convex  techniques 
that  are  discussed  later  in  this  work  in  the  case  where  H+  and  H~  are  independent  of  (x,  t),  by  rewriting 
the  above  HJ  PDE  backward  in  time  (which  results  in  a  minus  sign  in  the  Hamiltonians  after  a  change 
of  variables)  to  get  to  an  equation  of  the  form  (2.1).  As  an  example,  we  consider  in  the  case  in  M2  where 
f(t,x,a,b)  =  (02,-61),  h(t,x,a,b)  =  0,  A  =  {a  €  R  :  [02!  <  1}  and  B  =  {b  e  R  :  |&i|  <  1}.  Following 
the  same  argument  as  in  the  previous  example,  we  obtain  that  H+  =  H~  and 

iJ±(t,a:,p)  =  minmax{a2p2  -  hpi}  =  \p2\  -  \pi  \ . 

beB  aeA 


Therefore,  applying  Theorem  3.1  and  its  corollary,  we  conclude  that  U  =  V  satisfies 


§-tU{x,t)  +  \\d2U{x,f)\[2  -  \\diU(x,t)\\2  =  0  on  K2  x  [t,T] 


[U(x,T)  =  g(x) 

Writing  (p(x,  t)  :=  U(x,T  —  t)  =  V (x,  T  —  t),  we  arrive  at 


on 


§iifi(x,t)  +  \\dpp(x,t) ||2  -  \\d2<p(x,t) ||2  =0  on  K2  x  [0,T] 


dt 

<fi(0,x)  =  g(x) 


on 
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which  is  now  an  HJ  PDE  of  the  form  (2.2)-(2.1).  We  will  discuss  more  complicated  cases  along  these 
lines  involving  high  dimensions  in  Example  4  of  Section  5. 

As  mentioned  earlier,  since  methods  and  algorithms  discussed  in  this  work  may  be  extended  to  time- 
dependent  (and  even  possibly  state-dependent  case)  provided  that  we  can  efficiently  evaluate  the  integral 
in  the  above  minimization  problems,  it  is  possible  to  apply  the  methods  proposed  here  to  more  general 
differential  games  where  H+  and  H~  depend  on  (x,t).  This  will  be  an  interesting  future  research  topic. 


4  Optimization  Methods 

As  in  [6],  we  suggest  to  calculate  the  solution  to  the  initial  value  problem  for  the  HJ  equation  with 
a  possibly  non-convex  Hamiltonian  H  but  with  convex  initial  data  J  by  solving  the  the  minimization 
problem  (2.3).  To  evaluate  a  value  of  <p  at  the  point  (x,t),  we  solve  the  d-dimensional  minimization 
problem  in  (2.3).  This  way,  all  the  values  <p  at  different  points  (x,t)  are  decoupled  and  therefore  they 
can  be  computed  in  parallel  without  any  communication.  The  minimization  problem  of  dimension  d  is 
of  a  low  complexity  (with  complexity  growing  polynomially  in  d.)  Moreover,  a  very  useful  advantage  of 
this  method  is  that  this  minimization  of  (2.3)  not  only  provides  the  value  ip(x,t),  but  also  the  limiting 
sub-differential  set  dxip,  since  we  have  that  (see  for  instance  [6]): 

dxip(x,t)  =  argmin,„6Rd { J* (v)  +  tH(v)  -  , 

where  argmin  returns  the  set  of  minimizers.  In  the  case  the  minimizer  is  unique,  we  have  a  simple  formula: 

Vxtp{x,t)  =  argmin^gRd {J*(v)  +tH(v)  -  (x,w)}  . 

However,  in  the  multi-valued  case,  numerical  instability  might  occur  in  our  non-convex  optimization  (not 
necessarily,  but  possibly) .  We  will  discuss  this  possible  numerical  instability  as  well  as  how  we  handle  the 
problem.  We  would  like  to  remark  that  when  we  are  solving  the  differential  dynamic  game  or  the  optimal 
control  problem,  the  limiting  sub-differential  set  dxip(x,t)  is  actually  extremely  useful  in  choosing  the 
parameters  in  the  admissible  sets  A  and  B  that  should  be  chosen  at  time  t  to  optimize  the  controls. 

4.1  ADMM  splitting  and  our  algorithm 

We  propose  to  minimize  (2.3)  using  the  following  ADMM/split-Bregman  [16,  34]  algorithm  in  the 
non-convex  case  for  a  given  parameter  p  >  0.  (In  numerical  examples  in  Section  5,  p  ranges  from  1  to 
10.) 

Algorithm  1 


For  n  =  1,2, ....,  do  the  following: 

Step  1: 


wk+1  G  argmin^ud  {tH(w)  +  ^\\Xk  -  vk  +  w||2}  , 


Step  2: 


vk+1  =  argmin„6Rd{j*(x)-(x,u)  +  ^j|Afc-x  +  w;fc+1||2}  , 


Step  3: 


Afc+r  =  Afc  _  vk+ 1  +  wk+ 1 
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Note  that  we  use  “G”  in  Step  1  because  the  minimizer  may  not  be  unique.  The  order  of  Steps  1  and  2  of 
our  algorithm  is  opposite  to  those  in  [6]  because  the  order  helps  avoid  any  complications  that  may  arise 
when  the  Hamiltonian  H  is  non-convex.  In  fact,  although  the  ADMM  algorithm  for  solving  a  convex 
optimization  problem,  e.g.  [13,  14],  has  been  well-studied  in  literature,  the  behaviour  of  ADMM  applied 
to  a  general  non-smooth  non-convex  optimization  problem  has  not  been  fully  understood,  and  remains 
an  area  of  active  research.  ADMM  may  generally  fail  to  converge  to  a  global  minimizer  due  to  non¬ 
convexity  but,  for  some  practical  nonconvex  problems,  e.g.  in  matrix  completion  [28,  29,  36,  37,  39]  and 
phase  retrieval  [35],  it  was  noticed  that  ADMM  works  very  well.  Therefore,  some  effort  has  been  made  to 
understand  the  behaviour  of  ADMM  in  the  non-convex  case.  In  particular,  convergence  of  ADMM  has 
now  been  proved  in  [16,  20,  32,  33,  34]  for  either  (1)  a  convex  (possibly  non-smooth)  functional  in  Step  2,  or 
(2)  a  non-convex  functional  with  some  additional  regularity  assumptions  (e.g.  strict  proximal  regularity, 
piecewise  affine,  etc.)  in  Step  2.  Readers  may  refer  to  the  aforementioned  papers  for  a  discussion  of  the 
convergence  of  ADMM  in  the  non-convex  case. 

Other  optimization  algorithms  can  also  be  used  to  minimize  (2.3),  e.g.  the  Chambolle-Pock  algorithm 
[5],  and  primal-dual  splitting  or  other  forms  of  operator-splitting/ADMM  schemes  [8]. 

4.2  Evaluation  of  proximal  mappings 

The  ADMM  splitting  introduced  in  the  previous  subsection  computes  proximal  maps  [23]: 

( I  +  df)~X{x)  :=  argminygRd  j/(y)  +  *  ||a;  -  ,  (4.1) 

in  Steps  1  and  2,  and  for  many  functions  /,  they  can  be  calculated  rapidly. 

4.2.1  Shrink  operators 

We  review  the  following  shrink  operators  for  proximal  maps  of  positively  homogeneous  of  degree  1 
convex  functions  $.  In  fact,  they  can  be  characterized  by  the  projection  of  a  point  x  to  a  closed  convex 
set  as  discussed  in  [6,  7].  Consider  the  convex  set 

K  =  $-1([0, 1])  :={£  :  $(*)  G  [0,1]}. 

The  given  functional  4>  actually  coincides  with  the  Minkowski  function  (a.k.a.  Minkowski  gauge)  of  K: 

<f>(:r)  =  pk(x )  :=  inf{r  >  0  :  x  G  tK}  . 

We  now  consider  the  gauge  dual  of  $,  <f>°  [12,  15],  which  can  be  given  as 

<&°(y)  =  max  (x,  y) . 

x£K 

The  Wulff  set  W  (see  for  instance  [26])  is  given  as 

W  =  (4>°)-1([0, 1]) 

and  therefore  the  gauge  dual  <f>°  can  be  given  as  the  Minkowski  functional  of  W, 

$°(y)  =  Pw{y )  =  inf{r  >  0  :  y  G  tW}  . 

By  either  the  Fenchel- Legendre  duality  [27]  or  the  gauge  duality  [12,  15],  we  obtain  that 

4>*  (■ y )  =  Iw  and  $(x)  =  max  (x,  y) , 

xew 

where  Iw  is  the  indicator  function  of  the  set  W.  By  the  Moreau  decomposition  [23]  of  a  closed  proper 
convex  function  /,  we  have,  for  any  a  >  0, 

(I  +  ad  f)~l(x)  +  a(I  +  a~1df)~1(x/a)  =  x  ,  (4.2) 
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which  gives  us 


(/  +  ai 9$)  1(x)  =  x  -  ProjaW(x) . 


(4.3) 


We  dehne  the  shrink  operator  of  a  general  positively  homogeneous  of  degree  1  convex  functional  $  as 
shrink$(x,a)  :=  (I  +  a0$)_1(x)  =  argminyeR<i  ja$(j/)  +  ^\\x  -  y\\ ||  =x-  Pio}aW(x) 


for  any  a  >  0.  In  evaluating  the  shrink  operator,  one  only  needs  to  compute  the  projection  map  of  a 
point  to  a  closed  convex  set.  Following  [6],  the  projection  to  a  closed  convex  set  K  for  a  point  x  outside 
K  can  be  given  by 


■o  ■  ,  n  -  VV>(x,s) 

ProJ k{x)  =X-  s- 


||V'0(x,s)||2  ’ 

where  ip  is  a  function  given  as  a  solution  to  (again)  the  following  Hamilton- Jacobi  equation: 


0,^+11^112  =  0  and  ip(-,  0)  =  J(-) 


(4.4) 


with  the  initial  value  J(x)  being  given  a  twice  differentiable  function  such  that  {J{x)  =  0}  =  dK  with 
J(x)  <0  inside  K ,  J{x)>  0  outside  K ,  and  s  is  the  value  such  that  ip(x,  s)  =  0.  The  function  satisfies  the 
property  ip(x,d(x,dK))  =  0  where  d(x,dW )  is  the  signed  distance  function  of  x  from  dK.  The  function 
ip  can  be  viewed  as  a  special  case  of  (2.1),  and  therefore  (2.3)  provides  an  explicit  formula  for  ip.  The 
minimization  process  of  (2.3)  can  then  be  given  again  by  an  ADMM  process  introduced  in  the  previous 
sub-section,  where  only  the  shrink2  operator,  which  will  be  described  clearly  below  in  this  section,  is 
used.  The  proximal  map  related  to  J  can  be  done  explicitly  by  Newton’s  method.  Now,  for  a  given  x 
the  equation  ip(x,  s)  =  0  can  be  solved  by  Newton’s  method  if  ip  is  differentiable: 


ip(x,sn )  =sn+  <P(x,sn) 
dsip(x,sn)  S  \\Vip(x,  s")||2 


(4.5) 


Therefore  the  shrink  operator  shrinks  can  be  evaluated  in  a  very  computationally  efficient  manner. 

In  particular,  a  special  family  of  shrink  operator  attracts  particular  attention,  which  is  the  set  of 
shrink  operators  of  the  p  norm  with  p  (E  [l,oo).  From  the  well-known  fact  that  the  Wulff  set  of  the 
p-norm  is  the  g-norrn  ball,  where  1/p  +  1/q  =  1,  e.g.  [26],  we  get  that 


shrinkp(x, a)(x)  :=  shrink|M|p(x,a)  =  x-  Proj^^^x) . 

Some  of  them  are  well-known  shrink  operators  for  some  p  as  described  below,  for  any  *  =  1, ... ,d :: 


[shrinki(x,  a)]* 
shrink2(x,  a) 


sgn(x,;)  max{|xj|  -  a,  0}  , 

JlRI  max(IMI2  -  a,°}  if  x  ^0, 

|  0  if  x  =  0 


where  we  adopt  the  convention  0/0  =  0.  Note  that  shrinki(x,  a)  is  often  used  in  compressed  sensing 
algorithms,  e.g.  in  [38].  Since  the  shrink  operators  shrinkp(x,  a)(x)  and  shrinkg(x,  a)(x)  are  related  to 
each  other  by  (4.2)  and  (4.3),  we  can  switch  to  the  shrink  operator  that  is  easier  to  evaluate.  Therefore 
to  evaluate  shrinkp(x,  a),  one  might  only  need  to  evaluate  either  the  projection  to  a  ||  -  ||p  closed  ball 
or  a  ||  •  | ] q  closed  ball.  As  described  in  [6],  the  projection  to  a  ||  •  ||p  closed  ball  can  be  solved  by  the 
aforementioned  process  by  using  the  HJ  PDE  (4.4)  with  initial  value  J(x)  =  5(||x||pm  —  a2m),  where 
m>2if2<p<oo  and  1/2  <  m  <  1  if  1  <  p  <  2.  The  power  m  is  chosen  such  that  either  J  or  J*  is  a 
twice  differentiable  function. 


4.2.2  Stretch  operators 


For  non-convex  HJ  PDE,  a  splitting  method  for  solving  (2.3)  gives  rise  to  proximal  maps  of  non- 
convex  Hamiltonians.  In  this  subsection,  we  focus  on  the  proximal  maps  of  —  <f>,  where  $  is  a  positively 
homogeneous  (of  degree  1)  convex  function  as  mentioned  in  the  previous  subsection. 

For  the  sake  of  notational  convenience,  before  we  discuss  the  stretch  operators,  let  us  define,  for  a 
given  non-empty  compact  set  C,  the  furthest  points  of  a  point  x  to  C  as  the  following  set-valued  function 

Fui'c(x)  :=  argmaxygC{||x  -  y ||2}  ,  (4.6) 

where  argmax  returns  the  set  of  maximizers.  Note  that  Furc(a;)  is  a  convex  set,  even  if  C  is  not 
convex,  though  we  will  not  use  this  property.  For  any  given  smooth  monotone  decreasing  function 
V  :  [0, oo )  ->EU  {-t~oo},  we  have 

argmaxygC{||x-  y||2}  =  arginin^  j  ^V(\\x-  y\\2) 


Therefore  Fui'c(x)  can  also  be  interpreted  as  the  argument  y  attaining  the  minimum  of  the  “potential” 
V(||  •  ||2)  between  a  point  x  and  the  compact  convex  C.  This  interpretation  of  the  map  Furc(x)  leads 
us  to  a  very  efficient  algorithm  for  evaluating  the  function  values,  which  will  be  further  elaborated  in 
the  end  of  this  subsection.  We  would  like  to  remark  that  the  above  equality  holds  for  all  such  monotone 
functions  V,  and  is  independent  of  the  function  V  chosen. 

In  order  to  have  a  better  understanding  of  the  function  Furc,  we  shall  plot  in  Figure  1  the  set  Furc 
for  some  specific  values  of  x  and  choices  of  C.  For  better  illustrative  purpose,  contour  curves  of  the 
function  \\x  —  ( •  )||  and  boundary  of  the  set  C  are  also  plotted  for  references.  The  point  x  is  marked  as 
black  stars  and  the  set  Furc(x)  is  marked  as  red  stars  in  each  case.  We  include  cases  for  both  single  and 
multi-valued  Furc(x). 
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Figure  1:  Plot  of  x,  the  set  Furc(*),  contour  curves  of  the  function  \\x  —  -||  and  boundary  of  the  set  C.  Top-left: 
C  =  {x  :  ( x,Ax }  <  1}  where  A  =  diag(l,  25/4),  x  =  (1.5,  0.5);  top-right:  same  C  as  top-left,  x  =  (1.5,0). 
Bottom-left:  C  is  a  regular  3-gon,  x  =  (0,-1);  Bottom-right:  C  is  a  regular  8-gon,  x  =  (0,0).  The  point  x  is 
marked  as  black  stars  and  the  set  Furc(a;)  is  marked  as  red  stars  in  each  case. 


Now  with  the  definition  of  the  map  Furc(x),  we  are  ready  to  evaluate  the  “proximal  map”  of  the 
non-convex  functional  —  a<F,  which,  by  definition,  is  given  by: 

(/  -  :=  argminyeRd  |-a$(y)  +  \\\x~  2/||l  j-  , 

where  argmin  returns  the  set  of  minimizers.  Now,  suppose  the  Wulff  set  of  <f>  is  given  by  W,  i.e., 


®(y)  =  max  (y,  c) . 
c£W 

In  fact,  by  a  direct  substitution  of  the  above  expression  of  $  into  the  definition  of  the  ‘proximal  map’, 
we  can  directly  get  that  the  set  [(/  —  a<9<b)_1(:z:)]  is  actually  the  y-projection  of  the  following  set-valued 
function  for  any  a  >  0, 

argminyeRd)CeaW  j^llz/—  (a;  +  c)|||  -  ^||a:  -+-  c||| 

By  directly  evaluating  the  above  minimization  problem,  we  arrive  at 

(I  —  ad^>)~1(x)  =  x  +  c( x) , 
where  c(x)  is  the  set-valued  function  given  by 

c(x)  :=  argminceaW  |-^||ar  +  c||2 

With  the  definition  of  the  Fui'c  operator  in  (4.6)  for  any  compact  set  C ,  we  can  rewrite  the  above 
expression  of  ‘proximal  map’  by  the  following  formula: 

(J  —  a9$)_1(a;)  =  x  +  FuraW(— x) .  (4.7) 

Since  this  proximal  map  is  usually  multi-valued  at  the  origin  (e.g.  when  W  is  balanced,  i.e.  W  =  —W), 
this  map  may  introduce  numerical  instability,  because  it  is  not  clear  which  maximum  to  choose. 

In  order  to  solve  the  aforementioned  problem,  inspired  by  the  definitions  of  the  shrink  operators, 
we  introduce  the  following  stretch  operators  for  proximal  maps  of  a  positively  homogeneous  of  degree  1 
convex  functional  $: 

stretchy (x,  a)  :=  x  +  /  w  ^FurQW(-x)(^) 

J  FurQ,vr(— x) 
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where  He  is  the  Hausdoff  measure  of  the  set  C.  Notice  that,  according  to  our  new  definition,  the  stretch 
operators  are  NOT  the  ‘proximal’  maps  as  stated  in  (4.7);  rather,  they  are  the  weighted  average  of  the 
‘proximal  map’  in  (4.7).  However,  if  the  cardinality  of  [(/  —  a<9<I>)_1(0)]  is  1,  there  is  a  unique  element 
o{Rd  which  attains  the  maximal  value  of  (4.6),  and  we  get  back  the  situation  of  a  single- valued  function, 
and  therefore 


(/  —  cu9<h)  :(a:)  =  stretchy  (a;,  a) . 

Three  other  examples  are  given  as  comparison  between  the  stretch  operator  and  the  proximal  map.  In 
particular,  with  this  definition,  we  notice  that  if  W  is  balanced,  i.e.  W  =  —W,  then  for  any  a  >  0,  we 
have 


[(/  —  a<9$)  x(0)]  =  argmaxcgaH/  { || c|| | }  and  stretchy (0,  a)  =  0  . 

Whereas,  if  [(/  —  a<9<l>)-1(0)]  =  {u*  :  1  <  i  <  n}  is  a  finite  set  of  points,  then 

n 

stretchy  (a;,  a)  =  Vi . 

i- 1 

Moreover,  we  notice  that  if  [(/  —  a<9<I>)_1(0)]  =  7  is  a  curve,  then 


stretchy  (a;,  a) 


v  da 


where  a  is  the  surface  measure  of  7. 

The  use  of  the  stretch  operator  (especially  the  averaging  of  the  set  in  the  definition)  is  to  provide 
numerical  stability,  which  is  reasonable  in  the  middle  of  an  ADMM  procedure.  This  is  because  the 
ADMM  iteration  will  continue  and  the  next  iterate  will  be  likely  to  leave  the  pathological  region  (i.e. 
the  set  of  points  where  the  function  is  multi-valued),  and  thereby  get  to  a  unique  minimum  in  the  next 
iteration  (considering  the  fact  that  the  pathological  region  is  usually  of  at  most  co-dimension  1). 

Nonetheless  it  will  not  be  appropriate  if  the  last  iteration  arrives  at  the  multi-valued  point  (i.e. 
where  the  pathological  behaviour  arises),  because  the  stretch  operator  does  not  output  any  minimum 
argument  in  the  set.  On  the  PDE  side,  these  locations  are  exactly  when  ‘the  gradient  of  the  solution  has 
a  discontinuity’  (to  be  more  exact,  when  the  solution  is  not  differentiable.)  Since  our  method  relies  on 
a  exact  representation  formula,  we  may  avoid  that  problem  by  moving  the  pathological  grid  point  to  a 
neighboring  point  (xe,t)  in  the  £  neighborhood  of  ( x,t )  where  the  pathological  situation  does  not  occur, 
e.g.  xe  =  x  +  sv  where  ||i7||2  <  1,  with  an  e  small  enough.  This  is  again  possible  considering  the  fact  that 
the  pathological  set  is  usually  of  co-dimension  at  most  1. 

In  order  to  have  an  efficient  implementation  of  Algorithm  1,  it  is  necessary  for  the  stretch  operators 
to  be  evaluated  in  high  speed.  In  fact,  some  stretch  opeators  can  be  easily  evaluated  and  given  explicitly 
below,  for  any  i  =  1, ...,  d: 


[stretchi(x,  a)],; 
stretchy  (2:,  a) 


sgn(ajf)(|a:*|  +  a) 
fy^OMI2  +  a)  lfx^O 
|  0  if  x  =  0 


where  we  now  write,  for  1  <  p  <  00, 


stretchy (x,  a) (x)  :=  stretchy. y  (a;,  a) . 

We  remark  that  we  have  adopted  the  convension  sgn(O)  =  0.  We  note  the  amusing  fact  that  stretchi  and 
stretcli2  are  monotone  operators  applied  to  x.  Figure  2  shows  the  vector  fields  representing  the  differences 
stretchp(a;,  a)  —  x  on  [—1,  l]2  for  p=  1,2  and  with  a  =  0.05. 
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Figure  2:  Vector  fields  representing  stretchp(a:,  a) 
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x  on  [—1,  l]2  with  a  =  0.05;  left:  p  =  1;  right:  p  =  2. 


For  a  more  general  compact  convex  set  C,  in  evaluating  the  stretch  operator,  one  needs  to  efficiently 
compute  the  furthest  point  Furc(x)  of  a  point  to  C: 


Furc(a:)  =  argmin,yeC 


\v{\\x-y\\ 2)j  , 


(4.8) 


where  V  :  [0,  oo)  -iRU  {+oo}  is  a  smooth  monotone  decreasing  function.  We  again  propose  to  minimize 
(4.8),  for  a  fixed  point  x  and  compact  convex  set  C,  by  a  ADMM/split-Bregman  algorithm  in  the  non- 
convex  case.  In  order  to  have  the  algorithm  implemented  efficiently,  we  shall  focus  on  the  evaluation  of 
the  following  specific  proximal  map  for  a  given  v  and  a  >  0: 

(/  +  ^dV{\\  •  ||2))  (v)  :=  argminpgRd  j)V(||p||2)  +  i||p-'r||2j 


where  9V(||  •  ||2)  is  the  limiting  sub-differential  of  V(||  •  ||2).  In  what  follows,  we  consider  only  the  case 
when  the  function  V  is  V(r)  :=  —  log(r).  The  computation  with  a  general  V  is  similar.  In  fact,  a  direct 

computation  shows  that  p  £  (/  +  f  cW(||  •  ||2))  1  (v)  satisfies  the  following  equation: 


(i-albll  2)p  =  v, 

which  in  turn  provides  us  with  the  following  explicit  expression: 

i+Vi+^IHI-^  if^0] 

9(^(0))  if  w  =  0 , 

where  W(||  •  ||2)  is  the  limiting  sub-differential  of  V(||  •  ||2)  and  dC  denotes  the  boundary  of  the  set  C. 
Again,  for  reasons  discussed  earlier  in  this  subsection,  when  v  =  0,  we  replace  the  multi-valued  proximal 
map  by  a  weighted  average  of  the  set,  which  is  zero  in  this  special  case. 

Now  we  are  ready  to  minimize  (4.8),  for  a  fixed  point  x  and  compact  convex  set  C,  using  the  following 
ADMM/split-Bregman  [16,  20,  32,  33,  34]  algorithm  in  the  non-convex  case  for  a  given  parameter  a  >  0: 


(^+f5F(|M|2))_1(V) 


Algorithm  2 


For  n  =  1,  2, we  compute  the  following: 


Step  1: 


pk+1  = 


/1+Vl+4g-1H^-!A±^h!^  (gfc  _vk+x)  +  x  i{qk_n 


,k 
k 


if  qk  -  rf 


x^0, 

x  =  0 , 
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Step  2: 


qk+1  =Projc(?7fe+/+1), 


Step  3: 


„fc+l  fc 
Tj  =  Tj 


fc+1 


fc+1 


Again  we  intentionally  arrange  the  ADMM  such  that  a  convex  functional  is  minimized  in  a  Step  2. 
Readers  may  refer  to  section  4.1  for  a  discussion  of  the  advantage  for  this  arrangement  as  well  as  the 
convergence  of  the  non-convex  ADMM.  The  projection  of  an  abitrary  compact  convex  C  is  discussed 
in  subsection  4.2.1.  With  the  above  algorithm,  together  with  very  effective  techinque  to  compute  the 
projection  operator,  we  believe  that  the  stretch  operator  for  a  large  class  $  can  be  computed.  For 
instance,  if  we  wish  to  evaluate  stretchy  (a;,  a)(x)  for  1  <  p  <  oo,  we  propose  to  rely  on  the  formula  (4.7), 
where  the  map  Fur^^Hi,  with  q  =  p/(p  —  1)  and  a  >  0  can  be  computed  by  Algorithm  2,  and  Step 
2  in  the  algorithm  can  be  evaluated  as  discussed  in  subsection  4.2.1,  i.e.  by  obtaining  the  solution  to 
(4.10)  with  initial  function  J(x)  =  |(||a;||gm  —  a2m)  where  m>2if2<q<oo  and  1/2  <  m  <  1  if 

1  <  q  <  2.  Figure  3  shows  the  set  Furc  for  some  specific  values  of  x  and  choices  of  C  using  Algorithm 

2  with  p  =  1  and  r/0:q0,p0  =  0.  For  illustrative  purposes,  contour  curves  of  the  function  ||x  —  ( •  )||  and 
boundary  of  the  set  C  are  also  plotted  for  references.  The  point  x  is  marked  as  black  stars  and  the  set 
Furp(a;)  is  marked  as  red  stars  in  each  case.  However,  we  notice  that  we  sometimes  encounter  difficulties 
with  computing  stretchy  (a;,  a)  (a;)  using  Algorithm  2  for  some  particular  choices  of  p  and  x,  and  in 
these  cases,  the  runtime  is  very  slow.  A  more  efficient  algorithm  to  evaluate  the  stretch  operators  in  full 
generality  will  be  a  very  interesting  and  important  topic  and  will  be  subjected  to  future  research. 
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Figure  3:  Plot  of  x,  the  set  Furc(a;)  computed  by  Algorithm  2  with  a  =  5,  contour  curves  of  the  function 
||a;  —  -||  and  boundary  of  the  set  C.  Top:  C  is  3/2-norm  ball  of  radius  0.5;  bottom:  C  is  5/2-norm  ball  of  radius 
0.5;  left:  x  =  (1.1, 1.2);  right:  x  =  (1.1, 0.1).  The  point  x  is  marked  as  black  stars  and  the  set  Furc(a:)  is  marked 
as  red  stars  in  each  case. 

Algorithm  2  which  calculates  the  Fur<y(j:)  can  now  be  used  to  compute  the  stretch  operators,  e.g. 
stretchp(a;,  a)(x)  when  p  /  1,2,  oo  and  therefore  these  stretch  operators  have  no  closed  form.  Figure  4 
shows  the  stretch  operator  stretchp(a;,  a)(x)  with  a  =  0.05  and  p  =  3/2.  In  here,  we  use  the  following 
method  to  modify  a  in  Algorithm  2  as  the  iteration  goes:  we  take  the  new  ct+  =  4<r  in  case  the  ADMM 
does  not  converge  (i.e.  it  does  not  produce  sequence  that  satisfies  the  error  being  bound  by  e)  and  then 
reinitialize  the  algorithm.  The  initial  a  is  chosen  as  a  =  2.  The  choice  of  p  in  the  ADMM  splitting  for 
the  projection  operator  in  Step  2  of  Algorithm  2  is  also  done  with  this  same  method. 
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Figure  4:  Vector  fields  representing  stretchp(a:,  a)  —  x  on  [—1,  l]2  with  a  =  0.05  and  p  =  3/2. 


4.2.3  Projection  of  a  point  from  the  inside  of  a  convex  set 

Closely  related  to  stretch  and  Furc  operators,  the  projection  map  of  a  point  a;  to  a  closed  (non-convex) 
set  C  is  defined  as  the  following  set-valued  function 


Projc(a;)  :=  argminpeC,{||a:  -  y ||2}  . 


(4.9) 


Now,  we  consider  a  special  case  when  C  :=  dK ,  where  K  is  a  closed  convex  set.  In  general,  the  projection 
to  the  convex  set  dK  from  the  inside  of  a  convex  set  can  be  given  by 

-rj  •  /  \  -  Vlp(x,S) 

ProJa/fOc  =  x-s  t 


l|V^(x,s)||2  ’ 

where  ip  is  a  function  given  as  a  solution  to  (again)  the  following  Hamilton-Jacobi  equation 


dsip  —  ||  V^||2  =  0  and  ip(-,  0)  =  J(-)  (4-10) 

where  the  initial  value  J(x)  is  given  a  twice  differentiable  function  such  that  { J(x )  =  0}  =  dK,  and  s 
is  the  value  such  that  i/)(x,s )  =  0.  This  function  ip,  again,  can  be  viewed  as  a  special  case  of  (2.1),  and 
therefore  (2.3)  provides  a  representation  of  ip.  The  solution  ip  shall  now  be  calculated  with  the  same 
strategy  that  is  described  in  the  previous  section.  The  value  of  s  can  now  be  solved  from  the  equation 
ip( x,  s)  =  0  by  a  Newton’s  method  similar  to  (4.5)  when  ip  is  differentiable: 

,n+l  _,n  *P(x,Sn)  lP(x,Sn) 

'  dsip(x,sn)  '  \\Vip(x,s")\\2-  > 

The  two  formulae  (4.5)  and  (4.11)  are  the  same  except  that  it  has  a  flip  of  the  sign,  which  comes  from 
the  fact  that  (4.4)  and  (4.10)  also  have  the  same  flip  of  the  sign. 
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For  the  projection  to  the  boundary  of  a  p  norm  ball  from  the  inside  of  the  ball,  the  calculation  process 
is  again  routine  using  a  projection  operator,  which  can  be  explicitly  calculated  by  obtaining  the  solution 
to  (4.10)  with  initial  function  J(x)  =  —  a2m),  where  a  >  0,and  m>2if2<p<oo  and 

1/2  <m<lifl<p<2.  Again,  recall  that  details  of  this  procedure  are  described  in  subsection  4.2.1. 

4.2.4  Other  proximal  maps 

The  following  remark  is  useful.  There  are  some  other  proximal  maps  that  are  known  explicitly,  e,g..  for 
quadratic  functions  /  =  \{Ax,  x)  +  (b ,  x)+c,  the  proximal  map  is  given  by  (I+df)^1(x)  =  ( I+A)~1(x—b ) 
when  —1  is  not  an  eignvalue  of  A.  Also,  it  is  handy  to  note  that  the  logarithmic  barrier  functions 

f(xi )  =  —  ^”=i  log(x$)  has  its  proximal  map  given  by  [(/  +  df)  1(x)].  =  x'+V^i+4_.  Some  other 
proximal  maps  of  diffferentiable  functions  can  be  explicitly  computed  by  Newton’s  Method.  Using  the 
well-known  Moreau  decomposition  [23],  we  have  a  large  dictionary  of  fast  methods  for  calculating  the 
proximal  gradients,  thereby  simplifying  and  accelerating  the  optimization  procedure. 


5  Numerical  Experiments 

In  this  section,  we  shall  apply  our  newly  proposed  numerical  algorithm  to  compute  the  viscosity 
solution  to  an  HJ  PDE  with  a  non-convex  Hamiltonian.  For  a  given  set  of  points  ( x ,  t)  ,  we  use  Algorithm 
1  to  compute  (2.3).  We  evaluate  (x,t)  in  a  given  set  of  grid  points  over  [— 3,3]2  x  {0}d-2,  i.e.  the  2 
dimensional  cross-section.  We  choose  our  error  tolerance  in  the  ADMM  iteration  as  S  =  0.5  x  10~8, 
which  acts  as  our  stopping  criterion.  The  penalty  parameter  p  in  Algorithm  1  is  chosen  specifically 
for  different  examples.  In  all  our  examples,  we  set  all  the  initial  values  ie°,  f0,  A°  as  0.  As  previously 
mentioned,  for  iterations  where  the  minimizer  is  multi-valued,  i.e.  where  the  pathological  behaviour 
arises,  the  minimization  problem  is  solved  at  a  neighboring  point  (xe,t)  in  an  e- neighborhood  of  (x,t), 
e.g.  xe  =  x  +  ev  where  ||u||2  <  1.  In  our  numerical  experiments,  we  always  let  e  =  0.5  x  10~8.  Our 
algorithm  is  implemented  in  C++  on  an  1.7  GHz  Intel  Core  i7-4650U  CPU.  Linear  algebra  packages 
BLAS  [40]  and  LAPACK  [41]  are  used  to  perform  matrix  inversions. 

Example  1  In  this  example,  we  consider  the  2-norm  distance  function  from  the  boundary  of  a  convex 
set  to  a  point  inside  the  set  itself.  We  notice  that  the  distance  function  <p  satisfies  (2.1)  where  the 
Hamiltonian  H(p)  =  —  ||p||2  is  now  non-convex.  We  consider  the  convex  set  as  an  ellipse  enclosed  by  the 
equation  (x,Ax)  =  1  where  A  =  diag(l,  25/4, 1, 1....).  Therefore  we  choose  our  initial  condition  for  the 
HJ  PDE  as  J(x)  =  |((x,  Ax)  —  1).  In  this  example,  we  choose  p  =  1  in  Algorithm  1.  Figure  5  shows 
the  contours  of  the  objective  function  (2.3)  in  this  example  when  x  =  (0.2, 0.5),  t  =  0.7  and  d  =  2  with 
local  minima  marked  as  black  stars.  From  the  figure,  we  can  infer  that,  in  general,  it  is  not  very  easy 
for  Algorithm  1  to  be  trapped  in  a  stationary  point  other  than  the  global  minimum.  Figure  6  shows 
the  zero  set  contours  of  the  solutions  (p(x,  t)  =  0  along  the  2  dimensional  cross-section  computed  using 
our  new  algorithm  where  t  =  0.1,  0.2. ..0.9  in  dimensions  d  =  2  and  128  respectively.  Table  1  shows  the 
computational  time  per  point  in  dimensions  d  =  2n  where  n  ranges  from  1  to  12,  over  the  grid  points  ( x ,  t) 
where  x  G  {(-3  +  OAp,  -3  +  O.lq,  0, ...,  0)  :  p,  q  =  0, ...,  60}  C  [-3, 3]2  x  {0}d-2  and  t  G  {0.1, 0.2... 0.9}. 
We  can  see  that  the  computational  effort  of  the  viscosity  solution  grows  very  slowly  w.r.t.  dimension 
(and  it  seems  to  grow  almost  linearly.) 
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Figure  5:  Contour  curves  of  objective  function  (2.3)  in  Example  1  with  x  =  (0.2, 0.5)  and  t  =  0.7  when  d  =  2. 
Local  minima  are  marked  as  black  stars. 


Figure  6:  Distance  functions  of  ellipses  in  its  interior,  i.e.  zero  set  contours  of  ip(x,  t)  =  0  with  t  =  0.1,  0.2, ...,  0.9 
(from  red  to  blue);  left:  in  2  dimenions;  right:  in  128  dimensions. 


Dimension 

Average  computational  time  per  point 

d  =  2 

6.7092e-07  (sec) 

d  =  4 

8.0650e-07  (sec) 

d  =  8 

1.1528e-06  (sec) 

d  =  16 

1.4601e-06  (sec) 

d  =  32 

2.1011e-06  (sec) 

CO 

II 

^3 

3.1948e-06  (sec) 

d  =  128 

5.0498e-06  (sec) 

d  =  256 

1.0219e-05  (sec) 

d  =  512 

2.0365e-05  (sec) 

d  =  1024 

3.6164e-05  (sec) 

d  =  2048 

7.1937e-05  (sec) 

d  =  4096 

1.3929e-04  (sec) 

Table  1:  Computational  time  per  each  point  for  viscosity  solution  over  the  grid  points  ( x,t )  where  x  £  {(—3  + 
O.lp,  -3  +  0.1  q,  0, ...,  0)  :  p,  q  =  0, ...,  60}  C  [-3, 3]2  x  {0}d_2  and  t  £  {0.1,  0.2... 0.9}. 

Example  2  Now  we  consider  the  1-norm  distance  function  from  the  boundary  of  a  convex  set  to  a  point 
inside  the  set.  The  distance  function  ip  now  satisfies  (2.1)  with  a  non-convex  Hamiltonian  H(p)  =  —  ||p||i. 
We  again  consider  the  convex  set  enclosed  by  the  level  curve  (x,  Ax)  =  1  where  A  =  diag(l,  25/4, 1, 1....), 
i.e.  we  set  the  initial  condition  for  the  HJ  PDE  as  J(x)  =  \{{x,  Ax)  —  1).  We  now  use  p  =  10  in  Algorithm 
1.  Figure  7  shows  the  contours  of  the  objective  function  (2.3)  in  this  example  when  x  =  (0.2,  0.5), 


16 


t  =  0.7  and  d  =  2  with  local  minima  marked  as  black  stars.  Now  the  objective  function  is  much 
more  non-convex  and  it  has  4  local  minima  which  are  close  to  each  other.  Figure  8  provides  the  zero 
set  contours  of  <p(x,t)  =  0  along  the  2  dimensional  cross-section  computed  with  our  new  algorithm 
where  t  =  0.1,  0.2. ..0.8  in  dimensions  d  =  2  and  1024  respectively.  Table  2  shows  the  computational 
time  per  point  in  dimensions  d  =  2n  where  n  ranges  from  1  to  12,  over  the  grid  points  (x,t)  where 
x  €  {(—3  +  0.1  p,  —3  +  0.1  q,  0, ...,  0)  :  p,  q  =  0,  ...,60}  C  [—3,  3]2  x  {0}d^2  and  t  £  {0.1, 0.2. ..0.8}.  We  can 
again  see  that  the  computational  effort  of  the  viscosity  solution  is  minimal  and  grows  very  slowly  with 
respect  to  dimension. 


Figure  7:  Contour  curves  of  objective  function  (2.3)  in  Example  2  with  x  =  (0.2, 0.5)  and  t  =  0.7  when  d  =  2. 
Local  minima  are  marked  as  black  stars. 

t  =  0.8  t  =  0.8 


Figure  8:  Distance  functions  of  ellipses  in  its  interior,  i.e.  zero  set  contours  of  ip(x,  t)  =  0  with  t  =  0.1,  0.2, ...,  0.8 
(from  red  to  blue);  left:  in  2  dimenions;  right:  in  1024  dimensions. 


Dimension 

Average  computational  time  per  point 

d  =  2 

8.6898e-07  (sec) 

d  =  4 

9.9677e-07  (sec) 

d  =  8 

8.8148e-07  (sec) 

d  =  16 

1.1595e-06  (sec) 

II 

CO 

to 

1.9954e-06  (sec) 

CO 

II 

^3 

3.0177e-06  (sec) 

d=  128 

5.0603e-06  (sec) 

d  =  256 

9.8860e-06  (sec) 

d  =  512 

2.0628e-05  (sec) 

d  =  1024 

3.8714e-05  (sec) 

d  =  2048 

7.3550e-05  (sec) 

d  =  4096 

1.4662e-04  (sec) 
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Table  2:  Computational  time  per  each  point  for  viscosity  solution  over  the  grid  points  ( x,t )  where  x  £  {(—3  + 
O.lp,  -3  +  O.lg, 0, ...,  0)  :  p,  q  =  0, 60}  C  [-3, 3]2  x  {0}d~2  and  t  £  {0.1,  0.2... 0.8}. 

Example  3  In  this  example,  we  compute  the  2-norm  projection  to  the  boundary  of  a  convex  set  from  a 
point  inside  the  set  itself.  Our  procedure  is  the  same  as  described  in  Subsecton  4.2.  During  our  evaluation 
of  the  projection,  we  shall  calculate  the  same  distance  function  from  the  boundary  of  a  convex  set  inside 
the  set  itself,  with  the  same  setting  as  in  our  previous  example.  In  this  example,  we  choose  p  =  5  in 
Algorithm  1.  We  also  set  the  error  tolerance  of  the  level  set  function  i/j(x,s)  as  \ijj(x,s)\  <  0.5  x  10~8. 
We  always  set  the  initial  guess  of  the  distance  as  s  =  0. 

We  first  consider  the  convex  set  again  as  an  ellipse  enclosed  by  the  equation  (x,  Ax)  =  1,  x  £  where 
A  =  diag(l,25/4, 1, 1....),  and  our  initial  value  is  chosen  again  as  J(x)  =  \({x,Ax)  —  1).  The  objective 
function  (2.3)  in  here  is  the  same  as  that  in  Example  1  for  a  given  x  and  t.  Figure  9  shows  the  projection 
of  the  points  when  d  =  2.  Table  3  shows  the  average  computational  effort  for  computing  the  projection 
to  the  boundary  of  the  ellipse  from  10000  realizations  of  a  random  point  p  =  (1, 0.5)  x  (where  dt 

are  some  random  numbers  in  (0, 1))  using  Newton’s  Method  in  dimensions  d  =  2”,  where  n  ranges  from 
1  to  7.  We  can  see  that  computational  effort  is  minimal  even  when  d  =  128. 


Figure  9:  Projections  of  the  points  p  where;  top-left:  p  =  (1,  0.1);  top-right:  p  =  (1,  0.5);  bottom:  p  =  (1,  2). 


Dimension 

Average  Number  of  Newton  Steps 

Average  Time  per  (Outer)  Iteration 

d  =  2 

26.0000 

1.5506e-05  (sec) 

d  =  4 

28.9191 

2.0678e-05  (sec) 

d  =  8 

32.4473 

3.4144e-05  (sec) 

d  =  16 

33.5961 

5.9824e-05  (sec) 

d  =  32 

34.9619 

1.1218e-04  (sec) 

II 

36.0000 

2.2064e-04  (sec) 

d  =  128 

37.0000 

4.3750e-04  (sec) 
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Table  3:  Average  computational  effort  to  evaluate  the  projection  of  10000  realizations  of  a  given  point  p  = 
(1,  0.5)  x  {di}1*- 2  (where  di  are  some  random  numbers  in  (0, 1))  to  the  boundary  of  the  ellipse. 


Next,  we  consider  a  more  interesting  case,  when  the  convex  set  is  set  as  {||x||p  <  1},  where  we  choose 
p  =  3/2  and  3.  We  choose  initial  value  as  J(x)  =  |(||a:||pm  —  1),  where  nn  =  3/4  if  p  =  3/2  and  m  =  2 
if  p  =  3.  As  described  in  section  4.2  we  use  a  Newton  method  as  an  inner  iteration  to  evaluate  the 
proximal  map  of  J(x).  Figure  10  give  the  contours  of  the  objective  functions  (2.3)  in  this  example  when 
x  =  (0.2,  0.5),  t  =  0.7  for  either  p  =  3/2,  nn  =  3/4  or  p  =  3  and  m  =  3.  We  can  see  from  these  figures 
that, in  general,  it  is  not  very  easy  for  Algorithm  1  to  be  trapped  in  a  stationary  point  other  than  the 
global  minimum.  Figure  11  show  the  projections  of  the  point  (0.1, 0.5)  to  the  respective  p-norm  balls  when 
d  =  2.  Table  4  and  Table  5  shows  the  average  computational  effort  for  computing  the  projection  to  the 
boundary  of  the  respective  p-norm  balls  from  10000  realizations  of  a  random  point  p  =  (1,  0.5)  x 
(where  di  are  some  random  numbers  in  (0,1))  using  Newton’s  Method  in  dimensions  d  =  2",  where  n 
ranges  1  to  5.  Because  of  the  fact  that  we  have  a  Newton  outer  loop  (to  solve  for  the  distance  function), 
an  ADMM  middle  loop  (to  evaluate  the  function  value  of  the  HJ  PDE)  and  a  Newton  inner  loop  which 
requires  the  inversion  of  a  dense  matrix  (to  calculate  the  proximal  map  of  J( x)),  the  algorithm  has  a 
slower  run-time  per  outer  iteration  than  in  the  previous  case  when  point  is  projected  to  an  ellipse.  Owing 
to  the  complex  shape  (as  well  as  the  non-convex  nature  of  the  problem),  this  problem  is  rather  difficult 
to  solve  numerically,  and  therefore  we  do  not  expect  an  instantaneous  run-time  as  in  the  previous  cases. 
Especially  when  p  grows  large,  we  observe  that  iteration  time  seems  to  slow  down;  see  Table  4  and  Table  5 
for  a  comparison.  However,  overall,  it  is  still  very  impressive  that  a  projection  of  a  point  to  the  boundary 
of  a  complex  shape  can  be  done  within  a  second  even  when  d  is  at  large  as  32. 


Figure  10:  Contour  curves  of  objective  functions  (2.3)  in  Example  3  with  x  =  (0.2,  0.5)  and  t  =  0.7.  when;  left: 
p  =  3/2  and  m  =  3/4;  right:  p  =  3  and  m  =  3.  Local  minima  are  marked  as  black  stars. 


Figure  11:  Projections  of  the  point  (0.1, 0.5)  from  interior  to  the  boundary  of  a,  left:  a  3/2-norm  ball;  and,  right: 
3-norm  ball. 
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Dimension 

Average  Number  of  (Outer)  Newton  Steps 

Average  Time  per  (Outer)  Iteration 

d  =  2 

5.0000 

2.2546  e-04  (sec) 

d  =  4 

4.2190 

5.2942  e-04  (sec) 

d  =  8 

5.0000 

1.6013e-03  (sec) 

d  =  16 

5.8914 

5.7806e-03  (sec) 

Si¬ 

ll 

CO 

to 

6.0000 

2.2575e-02  (sec) 

Table  4:  Average  computational  effort  to  evaluate  the  projection  of  10000  realizations  of  a  given  point  p  = 
(1,  0.5)  x  {di}^~2  (where  di  are  some  random  numbers  in  (0, 1))  to  the  boundary  of  the  a  3/2-norm  ball. 


Dimension 

Average  Number  of  (Outer)  Newton  Steps 

Average  Time  per  (Outer)  Iteration 

d  =  2 

9.0000 

1.04758e-03  (sec) 

d  =  A 

4.9931 

5.8386e-04  (sec) 

d  =  8 

4.9304 

2.0457e-03  (sec) 

d  =  16 

6.1698 

8.6849e-03  (sec) 

Si¬ 

ll 

CO 

to 

7.1514 

3.5360e-02  (sec) 

Table  5:  Average  computational  effort  to  evaluate  the  projection  of  10000  realizations  of  a  given  point  p  = 
(1,  0.5)  x  (where  di  are  some  random  numbers  in  (0, 1))  to  the  boundary  of  the  a  3-norm  ball. 

Example  4  Our  final  example  gives  a  very  interesting  solution  to  (2.1)  with  the  Hamiltonian  H (p)  = 
||Pi,2,...,z || 2  ^  ||pz+i,2,...,d||2)  where  pi,...,i  denotes  the  vector  with  first  l  coordinates  of  p,  and  likewise  for 
Pl+i,2,...,d-  Our  Hamiltonian  is  now  neither  convex  nor  concave.  This  example  can  be  regarded  as  a  case 
when  a  differential  game  is  considered.  The  solution  <p  is  expected  to  provide  contours  stretching  out  in  one 
direction  and  shrinking  in  another  direction.  We  again  consider  the  initial  value  as  J(x)  =  \{{x,  Ax)  —  1). 
As  in  Example  1,  we  choose  p  =  1  in  Algorithm  1.  Figure  12  shows  the  contours  of  the  objective  function 
(2.3)  in  this  example  when  x  =  (0.2,  0.5),  t  =  0.7  and  d  =  2.  The  local  minima  are  again  marked  as 
black  stars.  From  the  figure,  again  we  can  expect  that  it  is  not  very  easy  for  Algorithm  1  to  be  trapped 
in  a  stationary  point  other  than  the  global  minimum  in  general.  Figure  13  (top-left)  show  the  zero 
set  contours  of  the  solutions  4>{x,t)  =  0  computed  using  our  new  algorithm  where  t  =  0.1,  0.2. ..0.7  in 
dimensions  d  —  2.  In  this  example,  a  clear  comparison  is  performed  with  our  solution  to  the  solution  to 
Lax-Friedrichs  scheme.  A  first  order  Lax-Friedrichs  monotone  scheme  [24]  is  implemented  with  At  =  0.001 
and  Ax  =  0.005.  Figure  13  (top-right)  show  the  zero  set  contours  of  the  solutions  (f>(x,  t)  =  0  computed 
using  Lax-Friedrichs  where  t  =  0.1,  0.2. ..0.7  in  dimensions  d  =  2.  We  can  see  the  two  solutions  almost 
coincide.  However  differences  emerge  when  we  compare  in  detail  the  two  solutions.  Figure  13  (bottom) 
shows  an  enlarged  figure  of  the  zero  set  contours  of  the  two  computed  solutions  fi{x,t)  =  0.  We  can 
see  that  the  solution  shown  in  the  left,  i.e.  from  our  new  algoirthm,  has  sharp  corners  and  edges  in  the 
places  where  a  jump  in  gradient  is  established,  ie.  along  the  line  y  =  0;  whereas  that  in  the  right,  i.e. 
from  Lax-Friedrichs,  has  smooth  edges.  The  smoothing  of  solution  from  the  Lax-Friedrichs  is  well-known 
and  is  because  of  the  numerical  diffusion  introduced  in  the  scheme  to  keep  the  scheme  monotone.  This 
example  shows  clearly  a  very  strong  advantage  of  our  method:  being  able  to  effortlessly  capture  sharp 
discontinuities  in  derivative.  This  is  thanks  to  the  fact  that  the  Hopf  formula  provides  the  exact  solution, 
and  no  finite-difference  approximation  is  present  in  our  algorithm.  Table  6  shows  the  computational 
time  per  point  in  different  dimensions  d  and  l  specified  in  the  table  over  the  grid  points  (x,  t)  where 
x  G  {(-3  +  0.1  p,  -3  +  O.lg,  0, ...,  0)  :  p,  q  =  0, ...,  60}  C  [-3,  3]2  x  {0}6*"2  and  t  G  {0.1, 0.2... 0.7}.  We  can 
see  that  computational  effort  of  the  viscosity  solution  grows  very  slowly  w.r.t.  dimension  for  all  cases  of 
(,  and  it  is  very  efficient  considering  the  fact  that  a  solution  in  1024  dimensions  can  be  computed  nearly 
effortlessly  as  4  x  10~5  second  per  point.  All  in  all,  our  numerical  examples  show  very  low  run-time  to 
compute  viscosity  solution  in  very  high  dimensions,  and  therefore  our  new  algorithm  can  be  considered  as 
a  competitive  candidate  in  overcoming  the  curse  of  dimensionality  in  solving  non-convex  high-dimensional 
HJ  PDE. 
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Figure  12:  Contour  curves  of  objective  function  (2.3)  in  Example  4  with  x  =  (0.2, 0.5)  and  t  =  0.7  when 
d  =  2,  l  =  1.  Local  minima  are  marked  as  black  stars. 


Figure  13:  Zero  set  contours  of  ip(x,t)  =  0  to  the  differential  game  when  d  =  2,  with  t  =  0.1,  0.2,  ...,0.7  (from 
red  to  blue);  left:  Hopf-Lax;  right:  Lax-Friedrichs. 


Dimension 

l 

Average  computational  time  per  point 

d  =  2 

1 

8.9667e-07  (sec) 

d  =  4 

1 

1.0558e-06  (sec) 

d  =  4 

2 

9.5982e-07  (sec) 

d  =  8 

1 

9.2515e-07  (sec) 

d  =  8 

2 

9.2824e-07  (sec) 

d  =  8 

3 

1.2158e-06  (sec) 

d  =  8 

4 

9.2408e-07  (sec) 

d=  128 

64 

5.1779e-06  (sec) 

d  =  1024 

512 

3.5934e-05  (sec) 
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Table  6:  Computational  time  per  each  point  for  viscosity  solution  in  different  cases  of  d  and  l  over  the  grid  points 
(x,  t)  where  x  €  {(—3  +  O.lp,  —3  +  O.lg,  0, 0)  :  p,q  =  0,  ...,60}  C  [—3,  3]2  x  {0}d-2  and  t  €  {0.1,  0.2. ..0.7}. 


6  Concluding  Remarks 

In  this  work,  we  have  developed  a  new  algorithm  to  solve  a  vast  class  of  possibly  non-convex  and 
time  dependent  HJ-PDE  that  can  overcome  the  curse  of  dimensionality.  Thanks  to  the  Hopf  formula, 
our  method  is  ‘exact’  in  a  sense  that  no  finite  difference  approximation  is  involved.  Moreover,  it  can  be 
implemented  in  an  embarrasingly  parallel  fashion,  since  the  values  of  the  solution  between  neighboring 
points  are  fully  decoupled.  This  provides  a  very  promising  direction  for  the  search  of  an  algorithm  which 
shall  solve  a  general  HJ-PDE  (i.e.  possibly  x  dependent)  in  a  totally  parallel  way,  and  in  the  end  lead  us 
to  a  method  which  can  overcome  the  curse  of  dimensionality  in  solving  an  HJ-PDE  in  full  generality. 
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